Date: 19 November 2016
This report explores game ratings from IGN. The summary and structure of the dataset are as follows.
## X score_phrase
## Min. : 0 Great :4772
## 1st Qu.: 4657 Good :4741
## Median : 9312 Okay :2945
## Mean : 9312 Mediocre:1959
## 3rd Qu.:13968 Amazing :1804
## Max. :18624 Bad :1269
## (Other) :1134
## title
## Cars : 10
## Madden NFL 07 : 10
## Open Season : 10
## Brain Challenge : 9
## LEGO Star Wars II: The Original Trilogy: 9
## Madden NFL 08 : 9
## (Other) :18567
## url
## /games/aladdin/gba-566703 : 2
## /games/big-league-sports/wii-14275098 : 2
## /games/blur/xbox-360-14222096 : 2
## /games/call-of-duty-modern-warfare-2/ps3-2550: 2
## /games/crash-twinsanity/ps2-667247 : 2
## /games/defiance/pc-71832 : 2
## (Other) :18612
## platform score genre editors_choice
## PC :3370 Min. : 0.50 Action :3797 N:15107
## PlayStation 2:1686 1st Qu.: 6.00 Sports :1916 Y: 3517
## Xbox 360 :1630 Median : 7.30 Shooter :1610
## Wii :1366 Mean : 6.95 Racing :1228
## PlayStation 3:1356 3rd Qu.: 8.20 Adventure:1174
## Nintendo DS :1045 Max. :10.00 Strategy :1071
## (Other) :8171 (Other) :7828
## release_year release_month release_day
## Min. :1996 Min. : 1.000 Min. : 1.0
## 1st Qu.:2003 1st Qu.: 4.000 1st Qu.: 8.0
## Median :2007 Median : 8.000 Median :16.0
## Mean :2007 Mean : 7.139 Mean :15.6
## 3rd Qu.:2010 3rd Qu.:10.000 3rd Qu.:23.0
## Max. :2016 Max. :12.000 Max. :31.0
##
## 'data.frame': 18624 obs. of 11 variables:
## $ X : int 0 1 2 3 4 5 6 7 8 9 ...
## $ score_phrase : Factor w/ 11 levels "Amazing","Awful",..: 1 1 6 6 6 5 2 1 2 5 ...
## $ title : Factor w/ 12589 levels ".deTuned",".hack//G.U. Vol. 1: Rebirth",..: 5702 5703 9767 7249 7249 11405 2908 4446 2908 11405 ...
## $ url : Factor w/ 18577 levels "/games/0-d-beat-drop/xbox-360-14342395",..: 8390 8387 14319 10813 10812 16931 4271 6526 4270 16932 ...
## $ platform : Factor w/ 59 levels "Android","Arcade",..: 39 39 15 58 36 20 58 33 36 33 ...
## $ score : num 9 9 8.5 8.5 8.5 7 3 9 3 7 ...
## $ genre : Factor w/ 113 levels "","Action","Action, Adventure",..: 65 65 70 95 95 106 39 83 39 106 ...
## $ editors_choice: Factor w/ 2 levels "N","Y": 2 2 1 1 1 1 1 2 1 1 ...
## $ release_year : int 2012 2012 2012 2012 2012 2012 2012 2012 2012 2012 ...
## $ release_month : int 9 9 9 9 9 9 9 9 9 9 ...
## $ release_day : int 12 12 12 11 11 11 11 11 11 11 ...
In this section, I will plot histograms to see the distribution of each feature.
The distribution of score_phrase is slightly left-tailed. Next I will plot the distribution of score, which I expect to be similar.
The score and score_phrase distributions are similar, as expected.
Most of the games are not picked as the editor’s choice.
The two plots above show the score distribution of games with and without editor’s choice.
The total number of games per year increases from 1997 to 2008 and declines after that.
The plot shows the market share of the 10 most popular platforms. PC is the winner. It has approximately the same share as all PlayStation platforms combined.
Action is the most popular genre of all time.
There are 18624 entries in this dataset with 11 features (X, title, url, score_phrase, score, platform, genre, editors_choice, release_year, release_month, release_day). X is just the index, while title and url are specific to each game, so I will not include these three features in the analysis. release_year, release_month and release_day can be combined into a single feature called release_date. There is one factor feature that I ordered myself, namely score_phrase. The levels are as follows.
Disaster < Unbearable < Painful < Awful < Bad < Mediocre < Okay < Good < Great < Amazing < Masterpiece
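A minimal sketch of how such an ordering could be set in R, assuming score_phrase is read in as plain text. The toy vector below is made up for illustration and is not taken from the dataset.

```r
# Toy vector standing in for game$score_phrase (values are illustrative only).
score_phrase <- c("Good", "Amazing", "Bad", "Okay", "Great")

# Worst-to-best ordering, exactly as listed above.
phrase_levels <- c("Disaster", "Unbearable", "Painful", "Awful", "Bad",
                   "Mediocre", "Okay", "Good", "Great", "Amazing",
                   "Masterpiece")

# ordered = TRUE makes R treat the levels as ranked rather than alphabetical.
score_phrase <- factor(score_phrase, levels = phrase_levels, ordered = TRUE)

# Ordered factors support comparisons against a level name.
at_least_good <- score_phrase >= "Good"
print(at_least_good)
```

With an explicit level order, bar charts of score_phrase are drawn from worst to best instead of alphabetically.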
I am interested in score, genre and platform. I would like to examine which platform a gamer should buy in order to play a lot of high-quality games.
release_date will support the investigation of how games and platforms develop over time, while editors_choice will help me filter high-quality games.
Yes, I created release_date by combining release_year, release_month and release_day.
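# Approximate decimal year: each month counts as 1/12 of a year
# and each day as 1/(12*31) of a year.
game$release_date <- game$release_year +
(game$release_month-1)/12 +
(game$release_day-1)/(12*31)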
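As a worked check, the sketch below applies the same formula to two toy rows that mirror the first records shown in the structure output (12 and 11 September 2012). The `release_as_date` column is a hypothetical alternative that is not part of the original analysis; it uses base R's `ISOdate()` and `as.Date()` to build a true calendar date.

```r
# Toy rows mirroring the first records in the str() output above.
game <- data.frame(release_year  = c(2012, 2012),
                   release_month = c(9, 9),
                   release_day   = c(12, 11))

# Decimal-year approximation used in the report.
game$release_date <- game$release_year +
  (game$release_month - 1) / 12 +
  (game$release_day - 1) / (12 * 31)

# Hypothetical alternative (an assumption, not the report's method):
# a real Date column, which handles month lengths exactly.
game$release_as_date <- as.Date(ISOdate(game$release_year,
                                        game$release_month,
                                        game$release_day))

print(game)
```

The decimal form is convenient for a continuous x-axis, while a Date column may be preferable when exact calendar arithmetic matters.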
Most of the distributions are only slightly skewed, so no transformation is required here.
This plot shows the count of games for each popular platform. It is hard to compare years because the total number of games goes up and down over time. In the next plot I will put the percentage of games on the y-axis instead of the count to make the comparison easier.
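The conversion from counts to percentages can be sketched in base R as follows. The toy data frame is invented for illustration, and the report's own plotting code is not shown, so this is only one possible way to compute the shares.

```r
# Toy data: one row per game (values are illustrative only).
game <- data.frame(
  release_year = c(2007, 2007, 2007, 2007, 2008, 2008),
  platform     = c("PC", "PC", "Wii", "Xbox 360", "PC", "Wii")
)

# Count of games per year (rows) and platform (columns).
counts <- table(game$release_year, game$platform)

# margin = 1 normalises each row, so every year sums to 100 percent.
share <- prop.table(counts, margin = 1) * 100

print(round(share, 1))
```

Because each year is scaled to 100%, platforms can be compared across years even when the total number of games changes.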
We can now see that PC is the most consistent platform in terms of the number of games.
Red line is the average score. It tends to increase over time.
Release day is not a contributing factor to score.
The solid line is the median and the dashed lines are the first and ninth quantiles. The score varies by about 0.5 on average across the 12 months.
## # A tibble: 12 × 2
## release_month score_median
## <int> <dbl>
## 1 1 7.0
## 2 2 7.5
## 3 3 7.4
## 4 4 7.3
## 5 5 7.1
## 6 6 7.2
## 7 7 7.1
## 8 8 7.5
## 9 9 7.6
## 10 10 7.5
## 11 11 7.3
## 12 12 7.0
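A table like the one above can be produced by grouping on release_month and taking the median score. The tibble output suggests dplyr was used in the report. The sketch below uses base R's `aggregate()` instead to stay dependency-free, and its toy scores are made up.

```r
# Toy data: a few games in January and September (scores are illustrative).
game <- data.frame(
  release_month = c(1, 1, 1, 9, 9, 9),
  score         = c(6.0, 7.0, 8.0, 7.0, 7.6, 9.0)
)

# Median score per release month.
month_median <- aggregate(score ~ release_month, data = game, FUN = median)
names(month_median)[2] <- "score_median"

print(month_median)
```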
The counts are again hard to compare because the total number of games is changing, so I create another plot with percentages.
Action has around 20% market share throughout the years. Other genres rise and fall alternately. The following table shows the top-scoring genres with more than 100 games.
## # A tibble: 6 × 3
## genre genre_median_score number_of_game
## <fctr> <dbl> <int>
## 1 RPG 7.9 980
## 2 Action, Adventure 7.7 765
## 3 Action, RPG 7.7 330
## 4 Fighting 7.5 547
## 5 Platformer 7.5 823
## 6 Puzzle 7.5 776
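The filter-then-rank step behind this table could look like the sketch below. `top_genres` and its `min_games` parameter are hypothetical names introduced here, and the toy data uses a threshold of 2 rather than 100 so that the example stays small.

```r
# Rank genres by median score, keeping only genres with more than
# `min_games` games. Function and argument names are illustrative.
top_genres <- function(game, min_games = 100) {
  med <- tapply(game$score, game$genre, median)
  n   <- tapply(game$score, game$genre, length)
  out <- data.frame(genre = names(med),
                    genre_median_score = as.vector(med),
                    number_of_game = as.vector(n))
  # which() drops genres with no games (NA counts) as well as small ones.
  out <- out[which(out$number_of_game > min_games), ]
  out[order(-out$genre_median_score), ]
}

# Toy data (values are made up).
game <- data.frame(
  genre = c("RPG", "RPG", "RPG", "Puzzle", "Puzzle", "Puzzle", "Racing"),
  score = c(8.0, 7.9, 7.0, 7.5, 7.0, 8.0, 9.5)
)

res <- top_genres(game, min_games = 2)
print(res)
```

The size threshold keeps a genre with a single high-scoring game, such as the toy "Racing" row, from topping the ranking.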
The box plot summarizes the score distribution of the popular genres. They are not significantly different from each other.
The number of platforms tends to increase over time. Next I will investigate the behavior of the editors over time.
The editors tend to pick more games in recent years.
High-scoring games can be both in and out of editor’s choice.
While some platforms have a good number of games in recent years (e.g. iPhone) and some were more popular in the past, PC and the PlayStation series have a consistent number of games throughout the period of interest. Game score does not depend on the release day, but it tends to increase slightly with year. The best month to release a game is September. The worst are January and December.
The percentage of editors_choice games tends to increase with time. This is related to the uptrend in score with time. The number of gaming platforms is increasing. Some of the top-scoring games (score > 8) are not picked as editors_choice, which suggests that the editors must have other criteria for their choice.
The overall gaming standard is inflating, i.e. higher scores, more platforms and more editor’s choice games.
This plot is quite hard to interpret. The quality of games for each platform fluctuates over time.
This shows the continuity of the PlayStation series.
Some genres, such as RPG, evolve over time in terms of quality.
Most of the mean scores in the popular genre groups are consistent, except “Action, Adventure”, which declines from 1995 to 2007, and “RPG”, which rose rapidly in the period around 1995 to 1998.
Among the PlayStation series, after a new version is released (i.e. PlayStation 2, 3, 4), games for the old version (i.e. PlayStation 1, 2, 3) usually perform better!
Total number of games was rising from 1997 to 2008 and falling after 2008. This plot gives the overall view of the gaming industry throughout history.
PC and the PlayStation series are the most consistent platforms in terms of the number of games. If gamers want to have many gaming options available, PC and PlayStation are their choice.
Throughout the years, most of the mean scores for each genre are quite constant, except that those related to RPG are on the rise while those involving action are on the decline.
This dataset is about game ratings from ign.com, a famous game website. It involves over 18000 games from 1996 to 2016 and spans most of the gaming platforms and game genres available in this period. I start exploring this dataset by plotting the frequency of each variable, which gives me an overall understanding of the dataset. The trend in the gaming industry over this period becomes clear: the number of games peaked in 2008 and has been declining since then. Next I compare pairs of variables by means of scatter plots, line plots, stacked bar plots and box plots, investigating the evolution of game score, platform and genre over the period. Lastly I plot multivariate graphs to examine three variables simultaneously. By exploring this dataset, trends in genre can also be seen. I can also see which platforms are transient and which stand the test of time.
Univariate and bivariate plots are straightforward to generate and explore, but it is quite hard to create meaningful multivariate plots. In the future, using more complicated plots such as heat maps should give additional insights to the analysis.